SUMMARY

Consider a system S specified at any time t by a finite
dimensional vector x(t) satisfying a vector differential
equation $dx/dt = g(x, r(t), f(t))$, $x(0) = c$, where c is the
initial state, r(t) is a random forcing term possessing a
known distribution, and f(t) is a forcing term chosen, via a
feedback process, so as to minimize the expected value of a
functional $J(x) = \int_0^T h(x - y, t)\, dG(t)$, where y(t) is a
known function, or chosen so as to minimize the functional
defined by the probability that $\max_{0 \le t \le T} h(x - y, t)$
exceeds a specified bound.

It is shown how the functional equation technique of
dynamic programming may be used to obtain a new computational
and analytic approach to problems of this genre. The limited
memory capacity of present-day digital computers limits the
successful application of these techniques to first and second
order systems at the moment, with limited application to higher
order systems.
DYNAMIC PROGRAMMING AND STOCHASTIC CONTROL PROCESSES

Richard  Bellman 


1.  Introduction 

In this paper, we wish to indicate the application of the
functional equation techniques of the theory of dynamic
programming to the formulation and computational solution of
various types of variational problems arising in the study of
control processes with stochastic elements. Although the
methods displayed below are intimately related to those we have
previously presented in connection with deterministic control
processes, cf. [1], [2], [3], as might be expected, the presence
of stochastic effects introduces new difficulties of both
conceptual and analytic nature which must be carefully examined.

A  fundamental  problem,  arising  in  numerous  applications, 
is  that  of  determining  feedback  control  which  will  neutralize 
random  disturbances.  These  disturbing  influences  are  usually 
called  "noise." 

Here we shall consider the following particular version of
this general question. Let S be a physical system, specified
at any time t by a finite dimensional vector x(t). This
vector is determined as a function of time, and the initial
state of the system, by means of the differential equation

(1)  $\frac{dx}{dt} = g(x, r(t)), \qquad x(0) = c.$

The function r(t) appearing on the right is a random function
of time with known properties.

We shall not discuss here the far more difficult questions
which arise from the study of processes in which r(t) is only
imperfectly known initially, and is then determined more and
more accurately as the process continues. The reader interested
in these matters will find discussions of this type of problem
and further references in Robbins, [15], and Bellman and Kalaba.

A particularly important case, from the standpoint of both
analysis and application, is that where g(x, r(t)) is linear
in both x and r(t). The equation in (1) then has the simple
form

(2)  $\frac{dx}{dt} = Ax + r(t), \qquad x(0) = c.$

A rigorous formulation of the theory of nonlinear differential
equations with stochastic elements presents certain difficulties
which we shall not enter into here for reasons we shall detail
below. The linear equation, however, has been treated at great
length in a number of papers in full rigor; cf. Doob, [14]; see,
also, [a], and the recent papers of Booton, [12], [13]. Equations
of the form $dx/dt = (A + R(t))x$, where R(t) is a random
matrix, can also be treated in some detail.

We are primarily interested here in the case where
g(x, r(t)) is nonlinear, or where other nonlinearities arise,
in a fashion we shall discuss below, to a sufficient degree to
destroy any hope of using explicit analytic solutions to
resolve control problems.

To counteract the influence of r(t), and simultaneously
to direct the unperturbed system along more desirable lines, we
introduce "feedback control" in the form of a vector function
v(t). The defining equation now has the form

(3)  $\frac{dx}{dt} = g(x, r(t), v(t)), \qquad x(0) = c,$

where v(t) is a function of the state of the system at time
t and the time t itself, i.e., $v(t) = v(x(t), t)$.

Let us denote by y(t) the solution of the unperturbed,
uncontrolled equation

(4)  $\frac{dy}{dt} = g(y), \qquad y(0) = c.$

In some cases, we may wish to keep x close to y over the
time interval (0, T). We agree then to measure the deviation
from y by means of a functional of the form

(5)  $J(v) = \int_0^T h(x - y)\, dG(t),$

where h(z) is a scalar function of the vector z. By
introducing a step discontinuity in G(t) at t = T, we can combine
deviation over the interval with terminal control.

At other times, the function y need not be a solution of
the unperturbed system, but merely a desirable state of the
system. In both cases, we see that we wish to determine the
control vector v(t) so as to minimize a prescribed functional
of x and v which can be written

(6)  $J(v) = \int_0^T h(x, v, r)\, dG(t).$

Since the functional itself will be, in general, a stochastic
quantity, in order to make this statement precise we must first
average J(v), in some suitable fashion, over the class of
random functions which occur. The problem we wish to consider
is that of minimizing this expected value of a function of J(v),
subject to constraints on v(t).

A rigorous formulation of variational problems involving
stochastic functions is again a matter of some difficulty. We
shall avoid both this difficulty, and the one mentioned concerning
the meaning of stochastic differential equations, by considering
only discrete control processes. In this way, we replace
differential equations by difference equations, integrals by
sums, and stochastic functions by stochastic sequences. The
reason for this change in format lies not so much in our desire
to avoid occasionally unpleasant rigorous details, as in our
desire to prepare the problem for solution by means of a digital
computer.

Nothing for nothing, however! It is now a matter of some
significance to study the connection between the original
continuous process and the approximating discrete process. Not
only is it important to know whether or not the respective
minimum values are close, but it is also important to know
whether the corresponding policies bear any similarity.
Furthermore, the rate of convergence of the discrete process to
the continuous process must be studied. This is critically
dependent upon the type of discrete approximation which is
employed. Some preliminary results in these directions may be
found in [1] and 00.

It  should  constantly  be  kept  in  mind  that  both  continuous 
and  discrete  processes  are  approximations  to  the  actual  physical 
process.  The  important  point  is  not  so  much  their  similarity  to 
each  other  as  the  value  of  either  mathematical  model  in  treating 
the  actual  control  process. 

We shall first apply the functional equation technique to
the general variational problem posed above. Then, as a simple
example, we shall discuss its specific application to the problem
of determining the scalar function v(t) in such a way as to
minimize the expected value of the functional

(7)  $\int_0^T u^2\, dt + |u(T)|,$

where u is the solution of the Van der Pol equation with the
forcing terms r(t) and v(t),

(8)  $u'' + \lambda(u^2 - 1)u' + u = r(t) + v(t), \qquad u(0) = c_1, \quad u'(0) = c_2.$

To show the versatility of the method, we shall then show
how to treat by means of recurrence relations the problem of
minimizing the probability that $J_1(v) > d$, where

(9)  $J_1(v) = \max_{0 \le t \le T} \|x - y\|.$

Here $\|z\|$ is the norm of z defined in one of the usual ways.
A treatment of the deterministic version of this problem may be
found in [5].

Finally, we shall discuss a case in which the random
function r(t) possesses a correlation with the value of
$r(t - \Delta)$. Here t assumes only the values $\Delta, 2\Delta, \ldots$.

As a subsequent discussion of the specific equation
mentioned above will show, the functional equation technique
of dynamic programming furnishes a feasible computational
solution for second order systems, without regard to the analytic
character of either the equation or the criterion function,
J(v). Although equations of higher order cannot be treated at
the moment by means of the same straightforward approach, more
refined analytic and computational techniques recently developed
appear to offer an approach to the successful treatment of
control problems for higher dimensional systems; see p], [7].

2.  Feedback  Control  as  a  Multistage  Decision  Process 

Let us now see how we can interpret feedback control as a
multistage decision process.

To begin with, we observe c, the initial state of the
system, and make an initial choice of a control vector, v(0).
As a result of the initial random effect, r(0), we find
ourselves at time $\Delta$ in a new state c', determined by the
equations governing the system, required to make a new choice of
a control vector. This situation repeats itself at times
$2\Delta$, and so on.

The salient fact that enables us to break this complex
process down into a sequence of simple processes is the dependence
of the future upon the present, and not upon the past, or upon
how the past became the present. Starting from any state at any
time, say $t_0$, we exert control in such a way as to minimize the
deviation from that time $t_0$ until the process ends. Whatever
deviation has occurred in the past does affect the total cost of
deviation of the system as measured, say, by the integral in
(1.6), but does not affect the sequence of choices we make from
the time $t_0$ on. This sequence of choices depends only upon
the state of the system at this particular time $t_0$ and the
behavior of the stochastic vector r(t) from $t_0$ on.

This statement, which perhaps appears paradoxical at first
glance, and is certainly rather difficult to express verbally,
is a simple consequence of the additivity of integrals, i.e.,

(1)  $\int_0^T = \int_0^{t_0} + \int_{t_0}^T,$

and the fact that the solution of a differential equation of
the form given in (1.3) is for $t > t_0$ dependent only upon its
value at $t_0$ and the values of r(t) for $t > t_0$.

Let us call a policy any choice of v(t) subject to the
constraints imposed, and an optimal policy a policy which
minimizes the prescribed criterion function. Then the remarks
we have made above concerning the independence of future
behavior from the past history of the process are particular
consequences of what we have called the

Principle of Optimality: An optimal policy has the property that
whatever the initial state and initial decision are, the remaining
decisions must constitute an optimal policy with regard to the
state resulting from the first decision.

The analytic translation of this statement yields
functional equations that lead to a computational solution of the
control process described above. See [1] for further discussion
and applications.

Finally, let us note in passing that, as we have discussed
elsewhere, [1], [2], [3], not only can the variational problems
derived from the study of control processes be considered to be
multistage decision processes, but actually the wider discipline
of the calculus of variations itself can be considered to be
part of the general theory of multistage decision processes of
continuous type.

3.  Discrete  Versions  of  Control  Processes 

Let us now prepare the way for the use of digital computers.
We begin by replacing the continuous process described in §1 by a
discrete process. The interval [0, T] is divided into N parts
of length $\Delta$, so that $N\Delta = T$, and t is allowed to assume only
the values $0, \Delta, 2\Delta, \ldots, N\Delta$. To simplify the notation, let us
write

(1)  $x(k\Delta) = x_k, \qquad r(k\Delta) = r_k, \qquad v(k\Delta) = v_k,$

and replace the differential equation of (1.3) by the difference
equation

(2)  $x_{k+1} - x_k = g(x_k, r_k, v_k)\,\Delta, \qquad x_0 = c.$

There is now no difficulty as to what we mean by a stochastic
sequence of values $\{x_k\}$ as generated by the difference
equation in (2). The random sequence of vectors $\{r_k\}$
constitutes a much more prosaic set than the set of values assumed
by a random function r(t), and one much easier to contemplate.

In place of choosing a function v(t) which minimizes the
expected value of a functional, we wish to choose a sequence of
vectors $\{v_k\}$ which minimizes the expected value of a function,

(3)  $J(\{v_k\}) = \sum_{k=0}^{N-1} h(x_k, r_k, v_k) + m(x_N).$

This is a well-formulated problem with no conceptual loose ends.

In the next section, we shall devote our energies to showing
how the functional equation technique of dynamic programming may
be applied to the problem posed in the foregoing lines.
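
The discrete formulation lends itself directly to simulation. The
following Python sketch estimates the expected value of (3) for a
given feedback policy by averaging over sampled disturbance
sequences. The function names, the scalar example, the clipped
proportional policy, and the use of a seeded pseudo-random generator
are illustrative assumptions of ours and are not part of the
formulation above.

```python
import random

def simulate_cost(g, h, m, policy, c, N, delta, sample_r, rng):
    """One realization of x_{k+1} = x_k + g(x_k, r_k, v_k) * delta; returns J of (3)."""
    x, total = c, 0.0
    for k in range(N):
        v = policy(x, k)        # feedback: v_k depends on the current state and stage
        r = sample_r(rng)       # random disturbance r_k
        total += h(x, r, v)     # running cost h(x_k, r_k, v_k)
        x = x + g(x, r, v) * delta
    return total + m(x)         # terminal cost m(x_N)

def expected_cost(trials, seed=0, **problem):
    """Average J over independent realizations (a Monte Carlo estimate)."""
    rng = random.Random(seed)
    return sum(simulate_cost(rng=rng, **problem) for _ in range(trials)) / trials

# Hypothetical scalar example: dx/dt = -x + r + v, cost x^2 per stage, |x_N| at the end.
problem = dict(
    g=lambda x, r, v: -x + r + v,
    h=lambda x, r, v: x * x,
    m=abs,
    policy=lambda x, k: max(-1.0, min(1.0, -x)),   # clipped proportional feedback
    c=1.0, N=20, delta=0.05,
    sample_r=lambda rng: rng.choice((-1.0, 1.0)),
)
estimate = expected_cost(2000, **problem)
```

Such a simulation evaluates one policy at a time; it does not by
itself determine the minimizing policy.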

4.  Functional  Equations 

Consider the problem of minimizing the expected value of

(1)  $J(\{v_k\}; a) = \sum_{k=a}^{N-1} h(x_k, r_k, v_k) + m(x_N)$

over sequences $\{v_k\}$, $k = a, a + 1, \ldots, N - 1$, where a is
one of the quantities $0, 1, 2, \ldots, N - 1$. As in §3, $x_{k+1}$ is
determined by the relation

(2)  $x_{k+1} - x_k = g(x_k, r_k, v_k), \qquad k \ge a, \qquad x_a = c,$

where the factor $\Delta$ of (3.2) is now absorbed into g.

It is clear that the minimum expected value of $J(\{v_k\}; a)$
depends upon c, the state at time a, and upon a itself.
Let us then define the function

(3)  $f_a(c) = \min_{P} \operatorname{Exp}\, J(\{v_k\}; a),$

where the minimum is over all policies P. The function is
defined for all c, and for $a = 0, 1, \ldots, N - 1$.

We see that

(4)  $f_{N-1}(c) = \min_{v_{N-1}} \operatorname{Exp}_{r_{N-1}} \left[ h(x_{N-1}, r_{N-1}, v_{N-1}) + m(x_N) \right],$

where

(5)  $x_N = c + g(c, r_{N-1}, v_{N-1}).$

The principle of optimality, stated in §2, yields the
recurrence relation

(6)  $f_a(c) = \min_{v_a} \operatorname{Exp}_{r_a} \left[ h(c, r_a, v_a) + f_{a+1}(c + g(c, r_a, v_a)) \right].$

Since $f_{N-1}(c)$ is determined by (4), the relation in (6)
enables us to compute $f_{N-2}(c)$, and so, step-by-step, eventually,
$f_0(c)$.
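
The recurrence (6) can be sketched in a few lines of Python when
the state, the control, and the disturbance each range over a finite
set. The sketch below is a toy instance under assumptions of ours: a
scalar integer state clipped to a finite lattice, a disturbance given
as a list of (value, probability) pairs, and the hypothetical name
backward_recursion. For brevity the recursion is started from
f_N(c) = m(c), which reproduces (4) at a = N - 1.

```python
def backward_recursion(states, controls, noise, g, h, m, N, clip):
    """Tabulate f_a(c) and a minimizing control for a = N-1, ..., 0.

    noise is a list of (value, probability) pairs; clip maps any
    successor state back onto the finite set of states.
    """
    f = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for c in states:
        f[N][c] = m(c)                      # convention f_N = m
    for a in range(N - 1, -1, -1):
        for c in states:
            best_v, best = None, float("inf")
            for v in controls:
                # expected value over r_a of h + f_{a+1}(c + g)
                exp = sum(p * (h(c, r, v) + f[a + 1][clip(c + g(c, r, v))])
                          for r, p in noise)
                if exp < best:
                    best_v, best = v, exp
            f[a][c], policy[a][c] = best, best_v
    return f, policy

# Toy problem: x_{k+1} = x_k + r_k + v_k on the lattice -3..3.
f, policy = backward_recursion(
    states=range(-3, 4),
    controls=(-1, 0, 1),
    noise=((-1, 0.5), (1, 0.5)),
    g=lambda c, r, v: r + v,
    h=lambda c, r, v: c * c,
    m=abs,
    N=2,
    clip=lambda x: max(-3, min(3, x)),
)
```

The table policy[a][c] records the feedback rule: the control to
apply at stage a when the observed state is c.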

5.  An  Example

Let us now apply these techniques to a particular example.

Consider the Van der Pol equation with a forcing term,

(1)  $u'' + \lambda(u^2 - 1)u' + u = r(t) + v(t), \qquad u(0) = c_1, \quad u'(0) = c_2,$

where the behavior of the random function r(t) will be
specified precisely below, and where it is desired to
choose v(t), subject to the constraint

(2)  $-b \le v(t) \le b,$

so as to minimize the expected value of

(3)  $J(v) = \int_0^T u^2\, dt + |u(T)|.$


In place of the second order equation in (1), we consider
the system

(4)  $\frac{du}{dt} = w, \qquad u(0) = c_1,$
     $\frac{dw}{dt} = -\lambda(u^2 - 1)w - u + r(t) + v(t), \qquad w(0) = c_2.$

This, in turn, is converted into the system of recurrence
relations

(5)  $u_{k+1} = u_k + \Delta\, w_k, \qquad u_0 = c_1,$
     $w_{k+1} = w_k + \Delta \left[ -\lambda(u_k^2 - 1)w_k - u_k + r_k + v_k \right], \qquad w_0 = c_2.$


Let us assume that the sequence $\{r_k\}$ is a sequence of
independent random variables with a common distribution function
dG(r). We shall consider the problem of correlation below.

It is desired to choose the sequence of values $\{v_k\}$,
subject to the restriction

(6)  $-b \le v_k \le b,$

so as to minimize the expected value of

(7)  $J_a(\{v_k\}) = \Delta \sum_{k=a}^{N-1} u_k^2 + |u_N|, \qquad a = 0, 1, 2, \ldots, N - 1.$


Set

(8)  $f_a(c_1, c_2) = \min_{P} \operatorname{Exp}_{r}\, J_a(\{v_k\}),$

for $a = 0, 1, 2, \ldots, N - 1$, $-\infty < c_1, c_2 < \infty$.

Then

(9)  $f_{N-1}(c_1, c_2) = \min_{v_{N-1}} \operatorname{Exp}_{r_{N-1}} \left[ \Delta c_1^2 + |u_N| \right],$

where $u_N = c_1 + \Delta c_2$. Hence

(10)  $f_{N-1}(c_1, c_2) = \Delta c_1^2 + |c_1 + \Delta c_2|.$


The equation of (4.6) becomes

(11)  $f_a(c_1, c_2) = \min_{v_a} \operatorname{Exp}_{r_a} \left[ \Delta c_1^2 + f_{a+1}(c_1 + \Delta c_2,\; c_2 + \Delta h) \right],$

or

(12)  $f_a(c_1, c_2) = \min_{v_a} \left[ \Delta c_1^2 + \int_{-\infty}^{\infty} f_{a+1}\big(c_1 + \Delta c_2,\; c_2 + \Delta h(c_1, c_2, r, v_a)\big)\, dG(r) \right], \qquad a = 0, 1, \ldots, N - 2,$

where

(13)  $h(c_1, c_2, r_a, v_a) = r_a + v_a - c_1 - \lambda(c_1^2 - 1)c_2.$

The minimization with respect to $v_a$ is over the interval
$-b \le v_a \le b$.

We have thus reduced the solution of the problem to the
computation of the sequence of functions of two variables
$\{f_a(c_1, c_2)\}$.
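
As a small worked instance of the relations (5), (10) and (13), the
following Python sketch evaluates one step of the discrete Van der
Pol system and the terminal function. The function names and the
numerical values are illustrative choices of ours, and the nonlinear
coefficient is written lam.

```python
def h_vdp(c1, c2, r, v, lam):
    """Right-hand side of the w-equation, as in (13)."""
    return r + v - c1 - lam * (c1 * c1 - 1.0) * c2

def step(c1, c2, r, v, lam, delta):
    """One application of the recurrence relations (5): (u_k, w_k) -> (u_{k+1}, w_{k+1})."""
    return c1 + delta * c2, c2 + delta * h_vdp(c1, c2, r, v, lam)

def f_last(c1, c2, delta):
    """f_{N-1}(c1, c2) from (10); it involves neither r nor v."""
    return delta * c1 * c1 + abs(c1 + delta * c2)

# Illustrative values only: state (1, 2), no disturbance, no control.
u1, w1 = step(1.0, 2.0, r=0.0, v=0.0, lam=1.0, delta=0.1)
last = f_last(1.0, 2.0, delta=0.1)
```

Since $u_N$ depends only upon $(c_1, c_2)$, the last stage requires no
minimization, which is why (10) is explicit.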


6.  Discussion 

Let us now discuss in more detail to what extent the algorithm
presented in the previous section is feasible. The concept of
feasibility is completely dependent upon the type of computer
available. We shall think in terms of a modern high speed
digital computer. As far as hand computation is concerned, the
method outlined above is definitely not feasible.

To carry out the determination of $f_a(c_1, c_2)$, we must
store the values of $f_{a+1}(c_1, c_2)$ in the computer, in one form
or another, evaluate the integral over r appearing in (5.12),
and then minimize over $v_a$.

Let us discuss these operations in turn. When we speak of
storing the values of $f_{a+1}(c_1, c_2)$ in the computer, we mean
that we must have a method for producing the value of
$f_{a+1}(c_1, c_2)$ at any particular point $(c_1, c_2)$ that is desired.
There are two ways of accomplishing this. In the first place,
we can agree that we are interested only in the points within
some square $-s \le c_1, c_2 \le s$, and then only in the values of
the function at a finite set of grid-points $(m\delta, n\delta)$,
$m, n = -M, -M + 1, \ldots, M$, where $M\delta = s$. If $(c_1, c_2)$ is
not a grid-point, the value of $f_{a+1}(c_1, c_2)$ is determined by
an interpolation formula.
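
The text does not fix the interpolation formula. As one possibility,
the Python sketch below uses bilinear interpolation over the four
surrounding grid-points, and clamps points outside the square to its
boundary; both choices, as well as the name interpolate and the table
layout, are assumptions of ours.

```python
def interpolate(table, M, delta, c1, c2):
    """Bilinear interpolation of values stored at (m*delta, n*delta), m, n = -M..M.

    table[m + M][n + M] holds the value at (m*delta, n*delta), with M >= 1.
    Points outside the square -s <= c1, c2 <= s are clamped to its edge.
    """
    s = M * delta
    x = min(max(c1, -s), s) / delta + M     # fractional row index in [0, 2M]
    y = min(max(c2, -s), s) / delta + M     # fractional column index in [0, 2M]
    i = min(int(x), 2 * M - 1)              # lower-left corner of the enclosing cell
    j = min(int(y), 2 * M - 1)
    tx, ty = x - i, y - j                   # position inside the cell, each in [0, 1]
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])

# Usage: tabulate an affine function, which bilinear interpolation reproduces exactly.
M, delta = 2, 0.5
table = [[2 * (i - M) * delta + 3 * (j - M) * delta
          for j in range(2 * M + 1)] for i in range(2 * M + 1)]
value = interpolate(table, M, delta, 0.25, -0.6)
```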

It follows then that storing the values of the function
$f_{a+1}(c_1, c_2)$ is equivalent to storing $(1 + 2M)^2$ numbers, the
values at $(m\delta, n\delta)$. If M = 50, not a particularly fine
subdivision if $c_1$ and $c_2$ are large, we require approximately
$10^4$ values. This is a considerable quantity, when we realize
that it must be multiplied by 3, to take account of the
storage of the values of the new function, $f_a(c_1, c_2)$, and the
policy function $v_a = v_a(c_1, c_2)$.

Problems of this magnitude, however, can be treated with
the largest of current digital computers, and will be routine
in a few years with the much larger machines being built at
present.

It is clear, nonetheless, that the storage of functions of many
variables cannot be accomplished along the crude lines described
above. Any further discussion would take us too far afield.
The interested reader may consult [8] for a brief sketch of an
entirely different approach.
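
The storage count is a matter of simple arithmetic, which the short
Python sketch below carries out. The function name and its
parameters are hypothetical, and the extension to a state of
dimension four, with the same M in every coordinate, is our own
extrapolation of the two-dimensional count given above.

```python
def stored_values(M, dim=2, tables=3):
    """Numbers held in memory: `tables` arrays of (1 + 2M)^dim grid values."""
    return tables * (1 + 2 * M) ** dim

per_table = (1 + 2 * 50) ** 2             # 10201, roughly 10^4
second_order = stored_values(50)          # 30603: f_{a+1}, f_a and the policy v_a
fourth_order = stored_values(50, dim=4)   # 312181203
```

The count grows by a factor of $(1 + 2M)$ with each added state
variable, which is the difficulty referred to above.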

Turn now to the problem of evaluation of the integral in
(5.12). Since these studies are all of preliminary nature,
it is wise to assume quite simple random effects. Hence if r
is taken to assume the values $\pm k$ with equal probability, the
integral in (5.12) becomes

(1)  $\tfrac{1}{2} \left[ f_{a+1}\big(c_1 + \Delta c_2,\; c_2 + \Delta h(c_1, c_2, k, v_a)\big) + f_{a+1}\big(c_1 + \Delta c_2,\; c_2 + \Delta h(c_1, c_2, -k, v_a)\big) \right].$

There is thus no difficulty in this evaluation.

Finally, consider the problem of determining the minimum
over $v_a$. For a variety of reasons, we do not wish to follow
any conventional lines involving the use of derivatives.
Hence, we choose a grid in the $v_a$-interval, say
$v_a = -q\delta_1, -(q - 1)\delta_1, \ldots, q\delta_1 = b$, and minimize only over
the discrete set of values $\pm l\delta_1$, $l = 0, 1, \ldots, q$. To do this,
we need only compare numerical values at these points. If further
accuracy is desired, interpolation can again be used.

A very important aspect of this direct minimization is
that the presence of constraints aids rather than hurts. The
more constraints, the smaller the set of allowable choices of
$v_a$ and the more rapid the numerical search. In particular, the
simplest case is that which is occasionally called "bang-bang"
control, cf. [10], where $v_a$ is allowed to assume only the
values $\pm b$.
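
The three operations discussed in this section (storage on a grid,
the two-point average over r, and the comparison of values over a
grid of controls) can be combined into a single routine. The Python
sketch below does so for the recurrence (5.12), starting from (5.10).
It is a minimal sketch under assumptions of ours: bilinear
interpolation with clamping at the edge of the square, and the
hypothetical names lookup and solve_vdp with their parameters.

```python
def lookup(table, M, step, c1, c2):
    """Bilinear interpolation on the grid (m*step, n*step), clamped to the square."""
    s = M * step
    x = min(max(c1, -s), s) / step + M
    y = min(max(c2, -s), s) / step + M
    i, j = min(int(x), 2 * M - 1), min(int(y), 2 * M - 1)
    tx, ty = x - i, y - j
    return ((1 - tx) * (1 - ty) * table[i][j] + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1] + tx * ty * table[i + 1][j + 1])

def solve_vdp(N, delta, lam, b, k, M, grid_step, q):
    """Return the table of f_0 and the policy table v_0 (None if N == 1)."""
    pts = [m * grid_step for m in range(-M, M + 1)]
    controls = [b * l / q for l in range(-q, q + 1)]      # grid in the v-interval
    # f_{N-1} from (5.10)
    f_next = [[delta * c1 * c1 + abs(c1 + delta * c2) for c2 in pts] for c1 in pts]
    policy = None
    for a in range(N - 2, -1, -1):
        f_cur = [[0.0] * len(pts) for _ in pts]
        policy = [[0.0] * len(pts) for _ in pts]
        for i, c1 in enumerate(pts):
            for j, c2 in enumerate(pts):
                best, best_v = float("inf"), None
                for v in controls:
                    val = delta * c1 * c1
                    for r in (k, -k):                     # r = +k or -k, equally likely
                        hh = r + v - c1 - lam * (c1 * c1 - 1.0) * c2
                        val += 0.5 * lookup(f_next, M, grid_step,
                                            c1 + delta * c2, c2 + delta * hh)
                    if val < best:
                        best, best_v = val, v
                f_cur[i][j], policy[i][j] = best, best_v
        f_next = f_cur                                    # only two tables are kept
    return f_next, policy

# Illustrative parameters only.
f0, v0 = solve_vdp(N=2, delta=0.1, lam=1.0, b=1.0, k=0.5, M=4, grid_step=0.5, q=2)
```

Setting q = 1 and discarding the value v = 0 from the control grid
would give the bang-bang case.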

7.  Minimum  of  Maximum  Deviation 

So far, we have been considering variational problems of
fairly conventional type. Using the same second-order equation
as in §5, let us consider the problem of determining v(t) so
as to minimize the probability that

(1)  $\max_{0 \le t \le T} |u| > d.$

The discrete version requires us to minimize the probability
that

(2)  $\operatorname{Max} \left[ |u_0|, |u_1|, \ldots, |u_{N-1}| \right] > d.$

The observation that

(3)  $\operatorname{Max} \left[ |u_0|, |u_1|, \ldots, |u_{N-1}| \right] = \operatorname{Max} \left[ |u_0|,\; \operatorname{Max} \left[ |u_1|, \ldots, |u_{N-1}| \right] \right]$

permits us to employ the principle of optimality in very much
the same way as before.

Introduce the sequence of functions

(4)  $f_a(c_1, c_2) = \min_{P} \operatorname{Prob} \left\{ \operatorname{Max} \left[ |u_a|, |u_{a+1}|, \ldots, |u_{N-1}| \right] > d \right\},$

for $a = 0, 1, 2, \ldots, N - 1$, and $-\infty < c_1, c_2 < \infty$.

Then

(5)  $f_{N-1}(c_1, c_2) = 1, \qquad |c_1| > d,$
     $f_{N-1}(c_1, c_2) = 0, \qquad |c_1| \le d,$

and

(6)  $f_a(c_1, c_2) = 1, \qquad |c_1| > d,$
     $f_a(c_1, c_2) = \min_{v_a} \int_{-\infty}^{\infty} f_{a+1}(c_1 + c_2 \Delta,\; c_2 + h\Delta)\, dG(r), \qquad |c_1| \le d,$

$a = 0, 1, 2, \ldots, N - 2.$
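
For a small number of stages, the relations (5) and (6) can be
checked by direct recursion, without tabulating $f_a$ on a grid. The
Python sketch below does this for a disturbance taking finitely many
values and a finite set of controls. It is not the grid method of
§6: the work grows exponentially with the number of stages, so it
serves only as an illustration. The name prob_exceed and the
numerical values are assumptions of ours.

```python
def prob_exceed(a, c1, c2, N, delta, lam, d, controls, noise):
    """f_a(c1, c2) of (4): minimal probability that Max[|u_a|, ..., |u_{N-1}|] > d.

    noise is a sequence of (value, probability) pairs for r.
    """
    if abs(c1) > d:
        return 1.0                  # the bound is already exceeded
    if a == N - 1:
        return 0.0                  # last stage, relation (5)
    best = 1.0
    for v in controls:
        p = 0.0
        for r, w in noise:
            hh = r + v - c1 - lam * (c1 * c1 - 1.0) * c2
            p += w * prob_exceed(a + 1, c1 + delta * c2, c2 + delta * hh,
                                 N, delta, lam, d, controls, noise)
        best = min(best, p)         # relation (6)
    return best

# Illustrative values: three stages, r = +1 or -1 with equal probability.
noise = ((1.0, 0.5), (-1.0, 0.5))
bang_bang = prob_exceed(0, 0.0, 0.0, 3, 1.0, 0.0, 1.0, (-1.0, 1.0), noise)
with_zero = prob_exceed(0, 0.0, 0.0, 3, 1.0, 0.0, 1.0, (-1.0, 0.0, 1.0), noise)
```

In this particular example, admitting the control value 0 lowers the
minimal probability from 0.5 to 0.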


8.  Correlation 

Let us now indicate how processes where the $r_k$ are not
independent random variables may be treated. The simplest of
these is that where the distribution of $r_k$ depends only upon
the value of $r_{k-1}$.

In this case, it is clear that an essential part of the
information pattern at each stage is the value of r at the
preceding stage. Let us define

(1)  $dG(r_k; r_{k-1})$ = the distribution function of $r_k$
     given the value of $r_{k-1}$,

and, returning to the model of §5,

(2)  $f_a(c_1, c_2; r_{a-1})$ = the minimum expected deviation
     starting at time a in the state $(c_1, c_2)$ and the
     information that r at a - 1 was $r_{a-1}$.

It is easy then to see that the recurrence relation now
has the form

(3)  $f_a(c_1, c_2; r_{a-1}) = \min_{v_a} \left[ \Delta c_1^2 + \int_{-\infty}^{\infty} f_{a+1}(c_1 + \Delta c_2,\; c_2 + \Delta h;\; r_a)\, dG(r_a; r_{a-1}) \right].$
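
To illustrate how the preceding value of r enters as an additional
state variable, the Python sketch below evaluates the recurrence by
direct recursion for a few stages. It rests on assumptions of ours:
r takes only the values +k and -k, it repeats its previous value with
probability p_stay, the last stage is given by (5.10), and the name
f_corr is hypothetical. As before, direct recursion is feasible only
for a small number of stages.

```python
def f_corr(a, c1, c2, r_prev, N, delta, lam, controls, k, p_stay):
    """Minimum expected deviation from stage a, given the state and r_{a-1} = r_prev."""
    if a == N - 1:
        return delta * c1 * c1 + abs(c1 + delta * c2)    # (5.10), independent of r
    best = float("inf")
    for v in controls:
        total = delta * c1 * c1
        for r in (k, -k):
            # conditional probability of r_a given r_{a-1}
            w = p_stay if r == r_prev else 1.0 - p_stay
            hh = r + v - c1 - lam * (c1 * c1 - 1.0) * c2
            total += w * f_corr(a + 1, c1 + delta * c2, c2 + delta * hh, r,
                                N, delta, lam, controls, k, p_stay)
        best = min(best, total)
    return best

# Illustrative values: two stages, starting at the origin with r_{-1} = +0.5.
controls = (-0.5, 0.0, 0.5)
independent = f_corr(0, 0.0, 0.0, 0.5, 2, 0.1, 1.0, controls, 0.5, 0.5)
persistent = f_corr(0, 0.0, 0.0, 0.5, 2, 0.1, 1.0, controls, 0.5, 1.0)
```

With p_stay = 0.5 the disturbance is independent of its past and the
value does not depend upon r_prev; with p_stay = 1 the next
disturbance is known in advance and can, in this example, be
cancelled exactly by the control.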